Neural network based system for script identification in Indian documents

نویسندگان

  • S BASAVARAJ
  • N V SUBBAREDDY
  • Subba Reddy
چکیده

The paper describes a neural network-based script identification system which can be used in the machine reading of documents written in English, Hindi and Kannada language scripts. Script identification is a basic requirement in automation of document processing, in multi-script, multi-lingual environments. The system developed includes a feature extractor and a modular neural network. The feature extractor consists of two stages. In the first stage the document image is dilated using 3 × 3 masks in horizontal, vertical, right diagonal, and left diagonal directions. In the next stage, average pixel distribution is found in these resulting images. The modular network is a combination of separately trained feedforward neural network classifiers for each script. The system recognizes 64 × 64 pixel document images. In the next level, the system is modified to perform on single word-document images in the same three scripts. Modified system includes a pre-processor, modified feature extractor and probabilistic neural network classifier. Pre-processor segments the multi-script multi-lingual document into individual words. The feature extractor receives these word-document images of variable size and still produces the discriminative features employed by the probabilistic neural classifier. Experiments are conducted on a manually developed database of document images of size 64 × 64 pixels and on a database of individual words in the three scripts. The results are very encouraging and prove the effectiveness of the approach.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

International Journal of Applied Science & Technology Research Excellence Vol. 1, Issue 1, Nov-Dec 2011, ISSN NO. 2250 – 2718 (Print), 2250 – 2726 (Online)

In this paper, we present a system towards Indian postal automation based on PIN (Postal Index Number) code. Since India is a multilingual and multi-script country that was earlier colonized by UK, the address part may be written by combination of scripts such as Latin (English) and a local (state) script. Here, we shall consider Oriya script one of the local state language in India with Englis...

متن کامل

A Comparative Analysis of Classifiers Accuracies for Bilingual Printed Documents (Oriya-English)

Bilingual document recognition has been the subject of intensive research and our focus is on the recognition of an Oriya-English bilingual documents. In most of our official papers, school text books, it is observed that English words interspersed within the Indian languages. So there is need for an Optical Character Recognition (OCR) system which can recognize these bilingual documents and st...

متن کامل

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...

متن کامل

Automation of Indian Postal Documents Written in Bangla and English

In this paper, we present a system towards Indian postal automation based on pin-code and city name recognition. Here, at first, using Run Length Smoothing Approach (RLSA), non-text blocks (postal stamp, postal seal, etc.) are detected and using positional information Destination Address Block (DAB) is identified from postal documents. Next, lines and words of the DAB are segmented. In India, t...

متن کامل

Distillation Column Identification Using Artificial Neural Network

  Abstract: In this paper, Artificial Neural Network (ANN) was used for modeling the nonlinear structure of a debutanizer column in a refinery gas process plant. The actual input-output data of the system were measured in order to be used for system identification based on root mean square error (RMSE) minimization approach. It was shown that the designed recurrent neural network is able to pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002